Pure Message Passing Can Estimate Common Neighbor for Link Prediction
Message Passing Neural Networks (MPNNs) have emerged as the de facto
standard in graph representation learning. However, when it comes to link
prediction, they often struggle, surpassed by simple heuristics such as Common
Neighbor (CN). This discrepancy stems from a fundamental limitation: while
MPNNs excel in node-level representation, they stumble with encoding the joint
structural features essential to link prediction, like CN. To bridge this gap,
we posit that, by harnessing the orthogonality of input vectors, pure
message-passing can indeed capture joint structural features. Specifically, we
study the proficiency of MPNNs in approximating CN heuristics. Based on our
findings, we introduce the Message Passing Link Predictor (MPLP), a novel link
prediction model. MPLP taps into quasi-orthogonal vectors to estimate
link-level structural features, all while preserving the node-level
complexities. Moreover, our approach demonstrates that leveraging
message-passing to capture structural features could offset MPNNs'
expressiveness limitations at the expense of estimation variance. We conduct
experiments on benchmark datasets from various domains, where our method
consistently outperforms the baseline methods. Comment: preprint.
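The core idea, that high-dimensional random vectors are nearly orthogonal, so one round of message passing followed by an inner product approximates a common-neighbor count, can be illustrated with a short sketch. This is an illustrative construction based on the abstract, not the authors' MPLP implementation; the signature dimension and sign-vector initialization are assumptions.

```python
import numpy as np

def estimate_common_neighbors(adj, dim=1024, seed=0):
    """adj: (n, n) 0/1 adjacency matrix. Returns an (n, n) matrix of CN estimates."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    # Quasi-orthogonal node signatures: random sign vectors scaled to unit norm.
    x = rng.choice([-1.0, 1.0], size=(n, dim)) / np.sqrt(dim)
    # One message-passing step: each node sums the signatures of its neighbors.
    h = adj @ x
    # Inner products of aggregated signatures concentrate around |N(u) ∩ N(v)|,
    # with noise that shrinks as the signature dimension grows.
    return h @ h.T

# Toy example: in the path graph 0-1-2, nodes 0 and 2 share one common neighbor.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
print(estimate_common_neighbors(adj)[0, 2])  # ≈ 1.0
```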
A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods
Multi-task learning (MTL) has become increasingly popular in natural language
processing (NLP) because it improves the performance of related tasks by
exploiting their commonalities and differences. Nevertheless, it is still not
understood very well how multi-task learning can be implemented based on the
relatedness of training tasks. In this survey, we review recent advances of
multi-task learning methods in NLP, with the aim of summarizing them into two
general multi-task training methods based on their task relatedness: (i) joint
training and (ii) multi-step training. We present examples in various NLP
downstream applications, summarize the task relationships and discuss future
directions of this promising topic. Comment: Accepted to EACL 2023 as a regular long paper.
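As a concrete illustration of the joint-training pattern the survey discusses (hard parameter sharing with per-task heads and a summed loss), here is a minimal PyTorch sketch; the module sizes, the LSTM encoder, and the two-task setup are illustrative assumptions rather than details from the survey.

```python
import torch
import torch.nn as nn

# Joint training: one shared encoder, one classification head per task,
# trained on a summed loss. Multi-step training would instead train on
# the tasks in successive stages.
class JointMTLModel(nn.Module):
    def __init__(self, vocab_size=10000, hidden=256, num_classes_per_task=(2, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)  # shared encoder
        self.heads = nn.ModuleList(nn.Linear(hidden, c) for c in num_classes_per_task)

    def forward(self, token_ids, task_id):
        _, (h, _) = self.encoder(self.embed(token_ids))  # final LSTM hidden state
        return self.heads[task_id](h[-1])                # task-specific classifier

# A joint training step would sum the per-task losses, e.g.
# loss = sum(cross_entropy(model(x_t, t), y_t) for each task t in the batch),
# so the shared encoder receives gradients from every task at once.
```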
What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks
Large Language Models (LLMs) with strong abilities in natural language
processing tasks have emerged and have been rapidly applied in various kinds of
areas such as science, finance and software engineering. However, the
capability of LLMs to advance the field of chemistry remains unclear. In this
paper, we establish a comprehensive benchmark containing 8 practical chemistry
tasks, including 1) name prediction, 2) property prediction, 3) yield
prediction, 4) reaction prediction, 5) retrosynthesis (prediction of reactants
from products), 6) text-based molecule design, 7) molecule captioning, and 8)
reagent selection. Our analysis draws on widely recognized datasets including
BBBP, Tox21, PubChem, USPTO, and ChEBI, facilitating a broad exploration of the
capacities of LLMs within the context of practical chemistry. Three GPT models
(GPT-4, GPT-3.5, and Davinci-003) are evaluated for each chemistry task in
zero-shot and few-shot in-context learning settings with carefully selected
demonstration examples and specially crafted prompts. The key results of our
investigation are 1) GPT-4 outperforms the other two models among the three
evaluated; 2) GPT models exhibit less competitive performance in tasks
demanding precise understanding of molecular SMILES representation, such as
reaction prediction and retrosynthesis; 3) GPT models demonstrate strong
capabilities in text-related explanation tasks such as molecule captioning; and
4) GPT models achieve performance comparable or superior to classical machine
learning models when applied to chemical problems that can be transformed into
classification or ranking tasks, such as property prediction and yield
prediction.
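To make the evaluation protocol concrete, below is a hedged sketch of how a few-shot in-context prompt for a property prediction task (e.g., BBBP) might be assembled. The prompt wording, the helper function, and the demonstration molecules are hypothetical and are not taken from the benchmark.

```python
# Hypothetical few-shot prompt construction for a BBBP-style task.
# The demonstration SMILES and labels below are placeholders only.

def build_bbbp_prompt(demos, query_smiles):
    """demos: list of (smiles, 'Yes'/'No') demonstration pairs."""
    lines = ["Predict whether the molecule penetrates the blood-brain barrier. "
             "Answer Yes or No."]
    for smiles, label in demos:
        lines.append(f"SMILES: {smiles}\nAnswer: {label}")
    lines.append(f"SMILES: {query_smiles}\nAnswer:")
    return "\n\n".join(lines)

demos = [("CCO", "Yes"), ("CC(=O)Oc1ccccc1C(=O)O", "No")]  # placeholder pairs
prompt = build_bbbp_prompt(demos, "c1ccccc1")
# The prompt would then be sent to the evaluated model (GPT-4, GPT-3.5, or
# Davinci-003); the zero-shot setting simply omits the demonstrations.
print(prompt)
```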
Flashlight: Scalable Link Prediction with Effective Decoders
Link prediction (LP) has been recognized as an important task in graph
learning with its broad practical applications. A typical application of LP is
to retrieve the top scoring neighbors for a given source node, such as the
friend recommendation. These services desire the high inference scalability to
find the top scoring neighbors from many candidate nodes at low latencies.
There are two popular decoders that the recent LP models mainly use to compute
the edge scores from node embeddings: the HadamardMLP and Dot Product decoders.
After theoretical and empirical analysis, we find that the HadamardMLP decoders
are generally more effective for LP. However, HadamardMLP lacks the scalability
for retrieving top scoring neighbors on large graphs, since, to the best of our
knowledge, no algorithm exists that retrieves the top scoring neighbors for
HadamardMLP decoders in sublinear complexity. To make HadamardMLP
scalable, we propose the Flashlight algorithm to accelerate the top scoring
neighbor retrievals for HadamardMLP: a sublinear algorithm that progressively
applies approximate maximum inner product search (MIPS) techniques with
adaptively adjusted query embeddings. Empirical results show that Flashlight
improves the inference speed of LP by more than 100 times on the large
OGBL-CITATION2 dataset without sacrificing effectiveness. Our work paves the
way for large-scale LP applications with the effective HadamardMLP decoders by
greatly accelerating their inference.
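For reference, a minimal sketch of the two decoders contrasted above is given below. It uses naive, linear-time top-k retrieval rather than the Flashlight algorithm, and the embedding sizes and MLP shape are assumptions.

```python
import numpy as np

# Dot Product scores a pair by an inner product of node embeddings;
# HadamardMLP feeds the element-wise product of the two embeddings through
# a small MLP. The top-k scan below is linear in the number of candidates;
# Flashlight's contribution is making HadamardMLP retrieval sublinear via
# approximate maximum inner product search with adjusted query embeddings.

def dot_score(h_u, H):                          # (d,), (n, d) -> (n,)
    return H @ h_u

def hadamard_mlp_score(h_u, H, W1, b1, w2, b2):
    z = np.maximum(0.0, (H * h_u) @ W1 + b1)    # ReLU(W1 (h_u ⊙ h_v) + b1)
    return z @ w2 + b2                          # linear output layer

def top_k_neighbors(scores, k=10):
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
n, d, hidden = 1000, 64, 32
H = rng.normal(size=(n, d))
W1, b1 = rng.normal(size=(d, hidden)), np.zeros(hidden)
w2, b2 = rng.normal(size=hidden), 0.0
print(top_k_neighbors(hadamard_mlp_score(H[0], H, W1, b1, w2, b2), k=5))
```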
Boosting Graph Neural Networks via Adaptive Knowledge Distillation
Graph neural networks (GNNs) have shown remarkable performance on diverse
graph mining tasks. Although different GNNs can be unified as the same message
passing framework, they learn complementary knowledge from the same graph.
Knowledge distillation (KD) is developed to combine the diverse knowledge from
multiple models. It transfers knowledge from high-capacity teachers to a
lightweight student. However, to avoid oversmoothing, GNNs are often shallow,
which deviates from the setting of KD. In this context, we revisit KD by
separating its benefits from model compression and emphasizing its power of
transferring knowledge. To this end, we need to tackle two challenges: how to
transfer knowledge from compact teachers to a student with the same capacity;
and, how to exploit the student GNN's own strength to learn knowledge. In this
paper, we propose a novel adaptive KD framework, called BGNN, which
sequentially transfers knowledge from multiple GNNs into a student GNN. We also
introduce an adaptive temperature module and a weight boosting module. These
modules guide the student to the appropriate knowledge for effective learning.
Extensive experiments have demonstrated the effectiveness of BGNN. In
particular, we achieve up to 3.05% improvement for node classification and
6.35% improvement for graph classification over vanilla GNNs.
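The abstract only names the adaptive temperature and weight boosting modules, so the sketch below shows just the standard ingredient they build on: temperature-scaled knowledge distillation between teacher and student logits. The loss weighting and temperature values are assumptions.

```python
import torch.nn.functional as F

# Standard temperature-scaled KD: the student matches the teacher's softened
# class distribution (KL term) while still fitting the true labels (CE term).
def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                          # usual T^2 scaling of the soft term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```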
Link Prediction with Non-Contrastive Learning
A recent focal area in the space of graph neural networks (GNNs) is graph
self-supervised learning (SSL), which aims to derive useful node
representations without labeled data. Notably, many state-of-the-art graph SSL
methods are contrastive methods, which use a combination of positive and
negative samples to learn node representations. Owing to challenges in negative
sampling (slowness and model sensitivity), recent literature introduced
non-contrastive methods, which instead only use positive samples. Though such
methods have shown promising performance in node-level tasks, their suitability
for link prediction tasks, which are concerned with predicting link existence
between pairs of nodes (and have broad applicability to recommendation systems
contexts), is yet unexplored. In this work, we extensively evaluate the
performance of existing non-contrastive methods for link prediction in both
transductive and inductive settings. While most existing non-contrastive
methods perform poorly overall, we find that, surprisingly, BGRL generally
performs well in transductive settings. However, it performs poorly in the more
realistic inductive settings where the model has to generalize to links to/from
unseen nodes. We find that non-contrastive models tend to overfit to the
training graph and use this analysis to propose T-BGRL, a novel non-contrastive
framework that incorporates cheap corruptions to improve the generalization
ability of the model. This simple modification strongly improves inductive
performance in 5/6 of our datasets, with up to a 120% improvement in
Hits@50, all with speed comparable to other non-contrastive baselines and up to
14x faster than the best-performing contrastive baseline. Our work imparts
interesting findings about non-contrastive learning for link prediction and
paves the way for future researchers to further expand upon this area. Comment: ICLR 2023. 19 pages, 6 figures.
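As background for the methods evaluated above, here is a minimal sketch of a BGRL-style non-contrastive objective: positive pairs only, with a stop-gradient target branch updated by an exponential moving average. It omits T-BGRL's cheap-corruption branch and is not the authors' implementation; the decay rate is an assumption.

```python
import torch
import torch.nn.functional as F

# Two augmented views of the graph are encoded; an online predictor must
# match the (stop-gradient) target representation of the other view, and
# the target encoder is an EMA copy of the online encoder. No negatives.
def bgrl_loss(online_pred_1, target_repr_2, online_pred_2, target_repr_1):
    def cos_loss(p, z):
        return 2.0 - 2.0 * F.cosine_similarity(p, z.detach(), dim=-1).mean()
    return cos_loss(online_pred_1, target_repr_2) + cos_loss(online_pred_2, target_repr_1)

@torch.no_grad()
def ema_update(target_params, online_params, decay=0.99):
    # Slowly move the target encoder's weights toward the online encoder's.
    for t, o in zip(target_params, online_params):
        t.mul_(decay).add_(o, alpha=1.0 - decay)
```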
Linkless Link Prediction via Relational Distillation
Graph Neural Networks (GNNs) have shown exceptional performance in the task
of link prediction. Despite their effectiveness, the high latency brought by
non-trivial neighborhood data dependency limits GNNs in practical deployments.
Conversely, MLPs, while known to be efficient, are much less effective than GNNs due to
the lack of relational knowledge. In this work, to combine the advantages of
GNNs and MLPs, we start with exploring direct knowledge distillation (KD)
methods for link prediction, i.e., predicted logit-based matching and node
representation-based matching. Upon observing that direct KD analogs do not perform
well for link prediction, we propose a relational KD framework, Linkless Link
Prediction (LLP), to distill knowledge for link prediction with MLPs. Unlike
simple KD methods that match independent link logits or node representations,
LLP distills relational knowledge that is centered around each (anchor) node to
the student MLP. Specifically, we propose rank-based matching and
distribution-based matching strategies that complement each other. Extensive
experiments demonstrate that LLP boosts the link prediction performance of MLPs
with significant margins, and even outperforms the teacher GNNs on 7 out of 8
benchmarks. LLP also achieves a 70.68x speedup in link prediction inference
compared to GNNs on the large-scale OGB dataset.
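A hedged sketch of the distribution-based matching idea is given below: for each anchor node, the student MLP is trained to match the teacher GNN's softmax distribution over link scores to a shared set of context nodes. The exact LLP formulation (and its rank-based counterpart) is in the paper; the temperature and reduction here are assumptions.

```python
import torch.nn.functional as F

def distribution_matching_loss(student_scores, teacher_scores, T=1.0):
    # student_scores, teacher_scores: (num_anchors, num_context_nodes), where
    # row i holds link scores from anchor i to its sampled context nodes.
    return F.kl_div(
        F.log_softmax(student_scores / T, dim=-1),
        F.softmax(teacher_scores / T, dim=-1),
        reduction="batchmean",
    )
```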
Graph-based Molecular Representation Learning
Molecular representation learning (MRL) is a key step to build the connection
between machine learning and chemical science. In particular, it encodes
molecules as numerical vectors preserving the molecular structures and
features, on top of which the downstream tasks (e.g., property prediction) can
be performed. Recently, MRL has achieved considerable progress, especially in
methods based on deep molecular graph learning. In this survey, we
systematically review these graph-based molecular representation techniques,
especially the methods incorporating chemical domain knowledge. Specifically,
we first introduce the features of 2D and 3D molecular graphs. Then we
summarize and categorize MRL methods into three groups based on their input.
Furthermore, we discuss some typical chemical applications supported by MRL. To
facilitate studies in this fast-developing area, we also list the benchmarks
and commonly used datasets in the paper. Finally, we share our thoughts on
future research directions.
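As a small illustration of the 2D molecular-graph input that the surveyed methods consume, the sketch below converts a SMILES string into atom and bond lists with RDKit; the particular feature choices (atomic number, bond order) are illustrative, not prescribed by the survey.

```python
from rdkit import Chem  # assumes RDKit is installed

def smiles_to_graph(smiles):
    """Turn a SMILES string into a simple 2D molecular graph."""
    mol = Chem.MolFromSmiles(smiles)
    atoms = [atom.GetAtomicNum() for atom in mol.GetAtoms()]            # node features
    bonds = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondTypeAsDouble())
             for b in mol.GetBonds()]                                   # edges + bond order
    return atoms, bonds

print(smiles_to_graph("CCO"))  # ethanol: atoms [6, 6, 8], bonds [(0, 1, 1.0), (1, 2, 1.0)]
```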